Abstract

Noise models are crucial for designing image restoration algorithms, generating synthetic
training data, and predicting algorithm performance. However, to accomplish any of
these tasks, an estimate of the degradation model parameters is essential. In this paper
we describe a parameter estimation algorithm for a morphological, binary image
degradation model. The inputs to the estimation algorithm are i) the degraded image, and ii)
information regarding the font type (italic, bold, serif, sans serif). We simulate degraded
images and search for the optimal parameter by looking for a parameter value for which
the neighborhood pattern distributions in the simulated image and the given degraded
image are most similar. The parameter space is searched using the Nelder-Mead downhill
simplex algorithm. We use the p-value of the Kolmogorov-Smirnov test for the measure
of similarity between the two neighborhood pattern distributions. We show results of our
algorithm on degraded document images.
1 Introduction


Numerous document image degradation models have been proposed in the literature
[1, 7, 8]. However, prior to using these models, it is important to i) validate the models,
that is, verify that the simulations generated by these models are similar to real-world
examples, and ii) provide algorithms for estimating the model parameters from real
samples. The issue of validation was addressed by Kanungo et al. [5, 6] by converting the
validation problem into a hypothesis testing problem and then using a permutation test
to test the null hypothesis that a synthetic sample of degraded characters and another
sample of real degraded characters come from the same underlying distribution. Lopresti
et al. [10] instead proposed to study the differences in the error characteristics of the
OCR output for the real and synthetic samples. This method, however, considers the
degradation coupled with the OCR system and not just the degradation process.

The issue of model parameter estimation has been studied to a lesser extent. Kanungo
and Haralick [4] reported results of some preliminary experiments that they conducted
to estimate the degradation model parameters using an objective function based on the
power function. They assumed that an ideal document image and the corresponding
degraded image were given. Baird [2] used the same power function approach to estimate
the parameters of another physics-based degradation model, and Sural and Das [13]
estimated the parameters of a two-state Markov chain document degradation model using
the power function approach. The drawback of all the above estimation approaches is
that they assume that the degraded image and the ideal image are perfectly aligned and
that character-level geometric groundtruth (bounding boxes) is available. This, however,
is not easy to achieve, since pixel-level alignment of documents is difficult in the presence
of arbitrary warping caused by changes in printer and scanner speeds. A preliminary
version of this paper appeared in [9].

In this paper we propose a parameter estimation algorithm that does not require the
degraded and ideal images to be aligned and does not require character-level geometric
groundtruth either. The algorithm is based on computing differences between the
distributions of neighborhood patterns in the degraded and synthetic images. In Section 2
we describe our document degradation model. Section 3 introduces neighborhood pattern
distributions. We outline the estimation algorithm in Section 4 and provide simulation
results in Section 5.

2  The  Morphological  Document  Degradation  Model 

In this section we briefly describe a document degradation model for the local degradations
that are introduced when documents are printed, scanned and digitized [5, 7, 8].

The model accounts for (i) the pixel inversion (from foreground to background and
vice versa) that occurs independently at each pixel due to light intensity fluctuations,
pixel sensitivity, and thresholding level, and (ii) the blurring that occurs due to the
point-spread function of the optical system of the scanner. We model the probability of
a background pixel flipping as an exponential function of its distance from the nearest
boundary pixel. The parameter $\alpha_0$ is the initial value for the exponential, and the decay
speed of the exponential is controlled by the parameter $\alpha$. The foreground and background
4-neighbor distances are computed using a standard distance transform algorithm [3].


The flipping probabilities of the foreground pixels are similarly controlled by $\beta_0$ and $\beta$.
The parameter $\eta$ is the constant probability of flipping for all pixels. Finally, the last
parameter $k$, which is the size of the disk used in the morphological closing operation [3],
accounts for the correlation introduced by the point-spread function of the optical system.

The degradation model thus has six parameters: $\Theta = (\eta, \alpha_0, \alpha, \beta_0, \beta, k)$. These
parameters are used to degrade an ideal binary image as follows:

1. Compute the distance $d$ of each pixel from the character boundary.

2. Flip each foreground pixel with probability

   $$p(0 \mid 1, d, \alpha_0, \alpha) = \alpha_0 e^{-\alpha d^2} + \eta.$$

3. Flip each background pixel with probability

   $$p(1 \mid 0, d, \beta_0, \beta) = \beta_0 e^{-\beta d^2} + \eta.$$

4. Perform a morphological closing operation with a disk structuring element of
diameter $k$.

The  application  of  the  various  steps  of  the  model  is  illustrated  in  Figure  1.  The 
procedure  described  above  works  on  bit-mapped  images.  Since  there  is  no  restriction  on 
the  size  of  the  image  that  can  be  degraded,  or  the  language  of  the  written  text,  an  entire 
document  page  image  can  be  degraded  using  this  model. 
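The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes that 1 denotes foreground, that the flip probabilities have the form $\alpha_0 e^{-\alpha d^2} + \eta$ for foreground pixels and $\beta_0 e^{-\beta d^2} + \eta$ for background pixels, and that the 4-neighbor distance is the city-block distance returned by SciPy's chamfer distance transform. The function names `disk` and `degrade` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk(diameter):
    """Boolean, roughly disk-shaped structuring element (an assumed discretization)."""
    diameter = int(diameter)
    c = (diameter - 1) / 2.0
    yy, xx = np.ogrid[:diameter, :diameter]
    return (yy - c) ** 2 + (xx - c) ** 2 <= (diameter / 2.0) ** 2


def degrade(ideal, eta, alpha0, alpha, beta0, beta, k, rng=None):
    """Degrade a binary image (1 = foreground) following steps 1-4."""
    rng = np.random.default_rng() if rng is None else rng
    fg = np.asarray(ideal).astype(bool)

    # Step 1: city-block (4-neighbor) distance of each pixel to the boundary.
    # Foreground pixels get their distance to the nearest background pixel,
    # background pixels their distance to the nearest foreground pixel.
    d_fg = ndimage.distance_transform_cdt(fg, metric="taxicab")
    d_bg = ndimage.distance_transform_cdt(~fg, metric="taxicab")
    d = np.where(fg, d_fg, d_bg).astype(float)

    # Steps 2-3: per-pixel flip probability, clipped to a valid probability.
    p_flip = np.where(fg,
                      alpha0 * np.exp(-alpha * d ** 2),
                      beta0 * np.exp(-beta * d ** 2)) + eta
    flips = rng.random(fg.shape) < np.clip(p_flip, 0.0, 1.0)
    noisy = fg ^ flips

    # Step 4: morphological closing with a disk of diameter k.
    closed = ndimage.binary_closing(noisy, structure=disk(k))
    return closed.astype(np.uint8)
```

With this parameter ordering, a call such as `degrade(ideal, 0.0, 0.6, 1.5, 0.8, 2.0, 3)` corresponds to the parameter vector used in Section 5. Passing a seeded `numpy.random.Generator` as `rng` makes a simulation repeatable.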

3  Neighborhood  Pattern  Distributions 

Our  estimation  algorithm  is  based  on  the  assumption  that  if  the  degradation  parameters 
are  estimated  correctly,  the  local  degradations  in  a  simulated  image  generated  using  the 
estimated  parameters  will  look  similar  to  those  of  a  real  image.  The  way  we  capture  this 
fact  is  by  looking  at  the  distribution  of  neighborhood  patterns. 

Let $P$ be a set of neighborhood bit patterns and $p$ be an arbitrary element in the
set $P$. For example, $p$ could be a 3 × 3 neighborhood with all 1s, or it could be a
5 × 5 neighborhood with a 1 in the middle and 0s everywhere else. Now we define the
neighborhood pattern distribution of an image $R$. Let $H_R$ denote a neighborhood pattern
distribution, so that $H_R(p)$, where $p \in P$, is the number of times the pattern $p$ occurs
in the binary image $R$. Using mathematical morphology [3] we can define $H_R(p)$ more
precisely: $H_R(p) = \#\{R \ominus p\}$.
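As an illustration of this definition, the following sketch counts every 3 × 3 bit pattern in a binary image. It assumes that $P$ is the set of all 3 × 3 patterns (the choice made in Section 5) and that only windows lying completely inside the image are counted. The function name `pattern_histogram` and the row-major bit encoding of a pattern are hypothetical choices made for the example.

```python
import numpy as np


def pattern_histogram(image, size=3):
    """Count occurrences of every size x size binary pattern in `image`.

    Each window is encoded as an integer whose bits are the window's pixels
    in row-major order, so a 3 x 3 window maps to one of 2**9 = 512 bins.
    """
    img = (np.asarray(image) != 0).astype(np.int64)
    h, w = img.shape
    out_h, out_w = h - size + 1, w - size + 1
    codes = np.zeros((out_h, out_w), dtype=np.int64)
    bit = 0
    for dy in range(size):
        for dx in range(size):
            # Shifted view of the image supplies one bit of every window code.
            codes |= img[dy:dy + out_h, dx:dx + out_w] << bit
            bit += 1
    return np.bincount(codes.ravel(), minlength=2 ** (size * size))
```

Bin 0 then holds the count of all-background windows and bin 511 the count of all-foreground windows.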

We conducted three experiments to study whether the pattern distributions could
discriminate various font and language characteristics. In particular, we studied whether
a change in i) font, ii) font size, or iii) the text or its language probabilities could be
detected using the neighborhood pattern distributions in ideal images. In Figure 2 and
Table 1 we show subimages of the same text typeset in serif, sans serif, bold, and italic
fonts. We find that the Kolmogorov-Smirnov test can easily detect the differences in
the neighborhood pattern distributions. In Figure 3 and Table 2 we show that even if
we change the font size of serif text, the neighborhood pattern distributions are quite
indistinguishable. Finally, in Figure 4 and Table 3 we show that if we replace the original
text with another from the same source, and keep the font characteristics identical, the
Kolmogorov-Smirnov test cannot detect the difference. This is quite an interesting result
because it says that in order to compare the noise pattern distributions of two images,
the two images need not have the same underlying ideal image.

Figure 1: Local document degradation model: (a) Ideal noise-free character; (b) Distance
transform of the foreground; (c) Distance transform of the background; (d) Result of
the random pixel-flipping process (the probability of a pixel flipping is
$p(0 \mid d, \beta, f) = p(1 \mid d, \alpha, b) = \alpha_0 e^{-\alpha d^2}$; here $\alpha = \beta = 2$, $\alpha_0 = \beta_0 = 1$);
(e) Morphological closing of the result in (d) by a 2 × 2 binary structuring element.


Table 1: Kolmogorov-Smirnov test statistics and significance level (T, P-value).

(T, P-value)    Serif            Sans Serif       Serif Bold       Serif Italic
Serif           (0.0, 1.0)       (0.193, 0.00)    (0.120, 0.00)    (0.078, 0.09)
Sans Serif      (0.193, 0.00)    (0.0, 1.0)       (0.096, 0.02)    (0.25, 0.00)
Serif Bold      (0.12, 0.00)     (0.096, 0.02)    (0.0, 1.0)       (0.19, 0.00)
Serif Italic    (0.078, 0.09)    (0.25, 0.00)     (0.19, 0.00)     (0.0, 1.0)

Figure 2: Text typeset in Computer Modern Roman font. (a) Serif text; (b) Sans Serif
text; (c) Serif bold text; (d) Serif Italic text.

4 The Estimation Algorithm

Let $I$ be the ideal image and $R$ be the given degraded image. The problem is to estimate
the model parameter $\Theta$ such that if we degrade $I$ with the model with parameter fixed
at $\hat{\Theta}$, we will get an image $S_{\hat{\Theta}}$ that looks similar to $R$. For our purposes, we say that
two images $R$ and $S$ are similar if the corresponding neighborhood pattern distributions
$H_R$ and $H_{S_\Theta}$ are similar. We use the Kolmogorov-Smirnov test [11] to test the similarity
of the two neighborhood pattern distributions. Let $KS(H_R, H_{S_\Theta})$ denote the KS test
p-value for the null hypothesis that the two distributions are the same. We will use this
p-value as the objective function that the estimation process tries to maximize. That is,

$$\hat{\Theta} = \arg\max_{\Theta} KS(H_R, H_{S_\Theta}). \qquad (1)$$

Notice that $S_\Theta$ is computed by simulation. Thus the derivatives of the objective
function cannot be computed in closed form. Hence standard derivative-based approaches to
maximizing $KS$ are not applicable. We therefore used the Nelder-Mead derivative-free
optimization algorithm [12] to maximize $KS$. There is no reason to believe that $KS$ is
unimodal over the model parameter space; the Nelder-Mead algorithm provides us with
only a local maximum. To circumvent this problem we do multiple random starts and then
pick the solution corresponding to the highest maximum value.
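The search described in this section can be sketched as follows, using SciPy's Nelder-Mead implementation and two-sample Kolmogorov-Smirnov test. Several things here are assumptions made for the example rather than details given in the text: the KS test is applied directly to the two vectors of pattern counts, `simulate_hist` is a hypothetical user-supplied function that degrades the ideal image with a candidate parameter vector and returns its neighborhood pattern histogram, and candidates outside the box `[lower, upper]` are simply assigned the worst cost.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import ks_2samp


def ks_p_value(hist_r, hist_s):
    """p-value of the two-sample KS test between two pattern-count vectors."""
    return ks_2samp(hist_r, hist_s).pvalue


def estimate(simulate_hist, hist_r, lower, upper, n_starts=10, seed=0):
    """Maximize the KS p-value with Nelder-Mead from several random starts.

    simulate_hist(theta) -> histogram of an image simulated with parameters
    theta (hypothetical callback). Returns (best_theta, best_p_value).
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    def cost(theta):
        # Nelder-Mead minimizes, so use 1 - p-value; reject out-of-range points.
        if np.any(theta < lower) or np.any(theta > upper):
            return 1.0
        return 1.0 - ks_p_value(hist_r, simulate_hist(theta))

    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)          # random start location
        res = minimize(cost, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:  # keep the best local optimum
            best = res
    return best.x, 1.0 - best.fun
```

Because each evaluation involves a random simulation, the cost surface is noisy; one possible mitigation (again an assumption, not something stated in the text) is to reuse the same random seed inside `simulate_hist` for every candidate.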
Figure 3: Text in various font sizes. (a) 6pt font; (b) 8pt font; (c) 12pt font; (d) 17pt
font.

5  Protocol  and  Results 

We start with a 400 × 400 ideal binary image $I$ such as that shown in Figure 5(a). The
given degraded image $R$ shown in Figure 5(b) was created using the model parameters
$\Theta = (0.0, 0.6, 1.5, 0.8, 2.0, 3)$. The neighborhood pattern set $P$ was chosen to be all the
possible binary patterns in a 3 × 3 window. Thus $P$ has $2^9 = 512$ patterns. The neighborhood
pattern distributions corresponding to Figures 5(a)-(c) are shown in Figures 6(a)-(c). Notice
that some patterns occur more frequently than others, and that the distributions of the ideal
and degraded images are different. The search was done for $\alpha_0$, $\alpha$, $\beta_0$, and $\beta$; $\eta$ and $k$ were
assumed known. The Nelder-Mead algorithm was started 10 times with random start
locations. The objective function value (1 - p-value) is plotted as a function of iterations
in Figure 7. The best solution found is $\hat{\Theta} = (0.0, 0.64, 1.57, 0.96, 2.02, 3)$.
In Figure 5(c) we show the image $R_{\hat{\Theta}}$ generated using the optimal solution $\hat{\Theta}$. Notice that
the neighborhood pattern distribution corresponding to the estimated image, which is
shown in Figure 6(c), is quite similar to the histogram of the original degraded image
shown in Figure 6(b). Note that the ideal image need not correspond to the degraded
image. In fact, one can use any other ideal image that has i) the same font type as that of
the degraded image (the font size can be different, however), and ii) language properties
similar to those of the degraded text.


Table 2: Kolmogorov-Smirnov test statistics and significance level (T, P-value).

(T, P-value)    6pt               8pt               12pt              17pt
6pt             (0.0, 1.0)        (0.053, 0.472)    (0.057, 0.382)    (0.086, 0.045)
8pt             (0.053, 0.472)    (0.0, 1.0)        (0.062, 0.268)    (0.076, 0.101)
12pt            (0.057, 0.382)    (0.062, 0.268)    (0.0, 1.0)        (0.049, 0.572)
17pt            (0.086, 0.045)    (0.076, 0.101)    (0.049, 0.572)    (0.0, 1.0)
Figure 4: Texts in 12pt serif Computer Modern Roman font. (a) A fragment of text from
one document. (b) Another fragment from a similar document. (c) A fragment of text
containing only ‘I’s. (d) A fragment of text containing only ‘O’s.


Table 3: Kolmogorov-Smirnov test statistics and significance level (T, P-value).

(T, P-value)    Fig 4(a)          Fig 4(b)          Fig 4(c)          Fig 4(d)
Fig 4(a)        (0.0, 1.0)        (0.025, 0.996)    (0.392, 0.00)     (0.384, 0.00)
Fig 4(b)        (0.025, 0.996)    (0.0, 1.0)        (0.412, 0.0)      (0.404, 0.00)
Fig 4(c)        (0.392, 0.00)     (0.412, 0.0)      (0.0, 1.0)        (0.075, 0.118)
Fig 4(d)        (0.384, 0.00)     (0.404, 0.0)      (0.075, 0.118)    (0.0, 1.0)
Figure 5: (a) A typical ideal image. (b) A degraded image with parameters
(0.0, 0.6, 1.5, 0.8, 2.0, 3). (c) Image generated using the estimated parameters
(0.0, 0.64, 1.57, 0.96, 2.02, 3).
[Figure 6 panels, each titled "Noise Pattern Histogram" with x-axis "Noise Patterns":
(a) Ideal Image; (b) Real Image; (c) Final Image (best estimate).]

Figure 6: Neighborhood pattern distributions corresponding to Figures 5(a)-(c). Each
bin along the x-axis corresponds to a different 3 × 3 neighborhood pattern.
[Figure 7 plot titled "Downhill Simplex"; x-axis: Number of Iterations.]

Figure 7: Downhill simplex convergence for different (random) starting locations.
6 Conclusion


We have described an algorithm for estimating the parameters of a degradation model.
The algorithm assumes that we know or can estimate the font type (serif, sans serif,
bold, italic) of the degraded image and then typeset an arbitrary ideal text image in the
same font. The ideal image is then degraded with various parameters of the degradation
model. For each parameter value the neighborhood pattern distributions of the simulated
image and the given degraded image are compared using the Kolmogorov-Smirnov test.
The parameter value that maximizes the p-value is used as an estimate of the model
parameters. The search for the optimal parameters is done using the Nelder-Mead algorithm.